{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "73bd968b-d970-4a05-94ef-4e7abf990827",
   "metadata": {},
   "source": [
    "Chapter 02\n",
    "\n",
    "# 余弦距离\n",
    "Book_4《矩阵力量》 | 鸢尾花书：从加减乘除到机器学习 (第二版)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4d2d126-5db0-4038-a1dc-9a83708d2220",
   "metadata": {},
   "source": [
    "此代码加载了鸢尾花数据集，并提取了数据集中的几个特定数据点以计算它们之间的余弦距离和夹角余弦值。选取的特征包含所有特征，而数据点分别为第1、2、51、101个观测。\n",
    "\n",
    "### 计算公式\n",
    "代码使用余弦距离和余弦相似度来度量两个向量之间的相似性。\n",
    "\n",
    "1. 余弦距离公式：\n",
    "\n",
    "$$\n",
    "\\text{cosine\\_distance} = 1 - \\frac{x_1 \\cdot x_2}{\\|x_1\\| \\|x_2\\|}\n",
    "$$\n",
    "\n",
    "2. 向量间夹角的余弦值：\n",
    "\n",
    "$$\n",
    "\\cos \\theta = \\frac{x_1 \\cdot x_2}{\\|x_1\\| \\|x_2\\|}\n",
    "$$\n",
    "\n",
    "余弦距离反映了两个向量间的角度差异，而夹角余弦值则用于计算向量的相似性。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6aec7dfe-a3d4-4cb9-b921-b3a137455aca",
   "metadata": {},
   "source": [
    "## 导入所需库"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "83ff9c7b-6adb-4fe1-91ef-4325850851b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.spatial import distance  # 导入SciPy库，用于计算向量间的距离\n",
    "from sklearn import datasets  # 导入sklearn数据集模块\n",
    "import numpy as np  # 导入NumPy库，用于数值计算"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4723d4a7-92b7-4c8d-8dcc-e678a7daded0",
   "metadata": {},
   "source": [
    "## 导入鸢尾花数据集"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f4654f70-41cf-40b9-ad1d-5e39c337a40a",
   "metadata": {},
   "outputs": [],
   "source": [
    "iris = datasets.load_iris()  # 加载鸢尾花数据集"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7d90350-e6b8-4f35-a30f-f17e3042bbfa",
   "metadata": {},
   "source": [
    "## 使用前两个特征：萼片长度和萼片宽度"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "111624e1-5071-43b5-ad7f-f91c1ea86f1f",
   "metadata": {},
   "outputs": [],
   "source": [
    "X = iris.data[:, :]  # 提取所有特征数据"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08b685c0-3e21-4c99-8f19-d15fc50ba309",
   "metadata": {},
   "source": [
    "## 提取4个数据点"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "af961fce-c345-4d97-a8d8-dc1fd27f75e6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5.1, 3.5, 1.4, 0.2])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1_data = X[0, :]  # 第一个数据点\n",
    "x1_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "98593e09-12b0-43a1-81c0-97751c3f44cf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([4.9, 3. , 1.4, 0.2])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2_data = X[1, :]  # 第二个数据点\n",
    "x2_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "261efd1f-06bc-4361-996b-62a2f621234a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([7. , 3.2, 4.7, 1.4])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x51_data = X[50, :]  # 第51个数据点\n",
    "x51_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "637692dd-8df7-472c-97e5-c1be1475c891",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([6.3, 3.3, 6. , 2.5])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x101_data = X[100, :]  # 第101个数据点\n",
    "x101_data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a14578f6-7866-4fea-8354-2633c0a50df7",
   "metadata": {},
   "source": [
    "## 计算余弦距离和夹角余弦值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "d73204a7-6235-4449-a3ea-90d677c04fe6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0014208364959781283"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1_x2_cos_dist = distance.cosine(x1_data, x2_data) \n",
    "# 计算x1和x2的余弦距离\n",
    "x1_x2_cos_dist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "1b3df27e-df51-4564-a192-b3d0544fd820",
   "metadata": {},
   "outputs": [],
   "source": [
    "x1_norm = np.linalg.norm(x1_data)  # 计算x1的L2范数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "efb9a6c3-ccfa-4000-8135-677fdd68c81d",
   "metadata": {},
   "outputs": [],
   "source": [
    "x2_norm = np.linalg.norm(x2_data)  # 计算x2的L2范数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "06c48d91-451f-404c-8e11-6006943c4a38",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "37.489999999999995"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1_dot_x2 = x1_data.T @ x2_data  # 计算x1和x2的点积\n",
    "x1_dot_x2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "5818da3f-985c-4c48-8fc9-159ec8067f2e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9985791635040218"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1_x2_cos = x1_dot_x2 / x1_norm / x2_norm  # 计算x1和x2的夹角余弦值\n",
    "x1_x2_cos"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "f3e0969d-ffbf-4c10-95ee-69898c57dc0c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0014208364959782394"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "1 - x1_x2_cos"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "62c0e4a9-c455-4290-9389-a2ee56c2adc1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0014208364959781283"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1_x2_cos_dist = distance.cosine(x1_data, x2_data)  # 计算x1和x2的余弦距离\n",
    "x1_x2_cos_dist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "46e0f1db-9989-4908-89de-ffe0f0a8f431",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.07161964128508791"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1_x51_cos_dist = distance.cosine(x1_data, x51_data)  # 计算x1和x51的余弦距离\n",
    "x1_x51_cos_dist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "0f516143-46a8-42ba-8e31-fe4b08d428de",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.1399186683412712"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1_x101_cos_dist = distance.cosine(x1_data, x101_data)  # 计算x1和x101的余弦距离\n",
    "x1_x101_cos_dist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85a80909-2aac-49ed-bb7a-f8cc6b80ee7d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ecd322f4-f919-4be2-adc3-69d28ef25e69",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
